Feedback Delay


Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet

Neural Information Processing Systems

Consider a player that in each of T rounds chooses one of K arms. An adversary chooses the cost of each arm in a bounded interval, and a sequence of feedback delays {d_t} that are unknown to the player. After picking arm a_t at round t, the player receives the cost of playing this arm d_t rounds later. In cases where t + d_t > T, this feedback is simply missing.
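The delayed-feedback loop described above can be sketched with a toy exponential-weights (EXP3-style) learner. The learning rate eta, the cost/delay containers, and the simplified importance-weighted update are illustrative assumptions for the sketch, not the paper's exact algorithm or tuning:

```python
import math
import random

def exp3_delayed(K, T, eta, costs, delays):
    """EXP3-style learner where the cost of the arm played at round t
    arrives delays[t] rounds later.  costs[t] is a length-K list of
    costs in [0, 1]; feedback with t + d_t > T is simply lost."""
    weights = [1.0] * K
    pending = {}          # arrival round -> list of (arm, cost, prob)
    total_cost = 0.0
    for t in range(T):
        s = sum(weights)
        probs = [w / s for w in weights]
        # sample an arm from the exponential-weights distribution
        arm = random.choices(range(K), weights=probs)[0]
        total_cost += costs[t][arm]
        arrival = t + delays[t]
        if arrival < T:   # otherwise this feedback never arrives
            pending.setdefault(arrival, []).append(
                (arm, costs[t][arm], probs[arm]))
        # apply all feedback whose delay expires at round t
        for a, c, p in pending.pop(t, []):
            est = c / p   # importance-weighted cost estimate
            weights[a] *= math.exp(-eta * est)
    return total_cost
```

The key point the abstract makes is visible here: the sampling distribution at round t can only use feedback that has already arrived, so the update for round t is postponed by d_t rounds.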


Decentralized Online Convex Optimization with Unknown Feedback Delays

Qiu, Hao, Zhang, Mengxiao, Achddou, Juliette

arXiv.org Machine Learning

Decentralized online convex optimization (D-OCO), where multiple agents within a network collaboratively learn optimal decisions in real time, arises naturally in applications such as federated learning, sensor networks, and multi-agent control. In this paper, we study D-OCO under unknown, time- and agent-varying feedback delays. While recent work has addressed this problem (Nguyen et al., 2024), existing algorithms assume prior knowledge of the total delay over agents and still suffer from suboptimal dependence on both the delay and network parameters. To overcome these limitations, we propose a novel algorithm that achieves an improved regret bound of O(N√d̄_tot + N√T/(1−σ₂)^(1/4)), where T is the total horizon, d̄_tot denotes the average total delay across agents, N is the number of agents, and 1−σ₂ is the spectral gap of the network. Our approach builds upon recent advances in D-OCO (Wan et al., 2024a), but crucially incorporates an adaptive learning rate mechanism via a decentralized communication protocol. This enables each agent to estimate delays locally using a gossip-based strategy without prior knowledge of the total delay. We further extend our framework to the strongly convex setting and derive a sharper regret bound of O(N δ̄_max ln T / α), where α is the strong convexity parameter and δ̄_max is the maximum number of missing observations averaged over agents. We also show that our upper bounds for both settings are tight up to logarithmic factors. Experimental results validate the effectiveness of our approach, showing improvements over existing benchmark algorithms.
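The gossip-based local estimation the abstract mentions can be illustrated with a minimal synchronous averaging step. The doubly stochastic mixing matrix W, the plain-list state, and the ring topology in the usage note are assumptions for this sketch, not the paper's protocol:

```python
def gossip_step(values, W):
    """One synchronous gossip-averaging step: each agent replaces its
    local estimate (e.g. an observed delay count) with a weighted
    average of its neighbors' estimates, using a doubly stochastic
    mixing matrix W.  Repeating the step drives all agents toward the
    network-wide average at a rate governed by the spectral gap."""
    N = len(values)
    return [sum(W[i][j] * values[j] for j in range(N)) for i in range(N)]
```

For example, on a 4-agent ring with self-weight 1/2 and neighbor weight 1/4, repeated steps converge geometrically to the average of the initial local estimates, which is how each agent can approximate a network-wide delay statistic without any central coordinator.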



A Illustration of RCL

Neural Information Processing Systems

We illustrate the online optimization process of RCL in Figure 1. We set b = 10 and A = I for the cost function in Eqn. The testing process is almost instant and takes less than 1 second. It does not use robustification during online optimization. By Theorem 4.1, there is a trade-off (governed by λ) between robustness and the quality of the ML predictions for those problem instances that are adversarial to ROBD.






Robust Learning for Smoothed Online Convex Optimization with Feedback Delay

Neural Information Processing Systems

We study a general form of Smoothed Online Convex Optimization, a.k.a. SOCO. We propose a novel machine learning (ML) augmented online algorithm, Robustness-Constrained Learning (RCL), which combines untrusted ML predictions with a trusted expert online algorithm via constrained projection to robustify the ML prediction. Specifically, we prove that RCL is able to guarantee (1+λ)-competitiveness against any given expert for any λ > 0, while also explicitly training the ML model in a robustification-aware manner to improve the average-case performance. Importantly, RCL is the first ML-augmented algorithm with a provable robustness guarantee in the case of multi-step switching cost and feedback delay. We demonstrate the improvement of RCL in terms of both robustness and average performance using battery management as a case study.
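The constrained-projection idea, keeping the ML action but never letting it stray farther from the trusted expert's action than a robustness budget allows, can be sketched in one dimension. The scalar actions and the fixed radius standing in for the λ-dependent budget are simplifying assumptions, not the paper's construction:

```python
def robustify(ml_action, expert_action, radius):
    """Project the untrusted ML action onto the interval of the given
    radius around the trusted expert's action.  If the ML prediction
    already lies inside the interval it is used as-is; otherwise it is
    clipped, which is what bounds the loss relative to the expert."""
    lo, hi = expert_action - radius, expert_action + radius
    return min(max(ml_action, lo), hi)
```

The design choice matters for the average case too: when the ML model is trained aware of this clipping (as the abstract describes), it learns to propose actions that survive the projection rather than actions that are silently overridden.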


Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays

Chen, Ziqun, Cai, Kechao, Chen, Zhuoyue, Zhang, Jinbei, Lui, John C. S.

arXiv.org Machine Learning

We study the stochastic combinatorial semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing and online advertising, where feedback is not immediately available and fairness among different choices (or arms) is crucial. We consider two types of unrestricted feedback delays: reward-independent delays, where the feedback delays are independent of the rewards, and reward-dependent delays, where the feedback delays are correlated with the rewards. Furthermore, we introduce merit-based fairness constraints to ensure a fair selection of the arms. We define the reward regret and the fairness regret and present new bandit algorithms to select arms under unrestricted feedback delays based on their merits. We prove that our algorithms all achieve sublinear expected reward regret and expected fairness regret, with a dependence on the quantiles of the delay distribution. We also conduct extensive experiments using synthetic and real-world data and show that our algorithms can fairly select arms with different feedback delays.
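The merit-based selection rule can be sketched as sampling each arm with probability proportional to a merit function of its estimated reward, so that no arm with positive merit is ever starved. The affine merit function below is an illustrative assumption, not the paper's choice:

```python
def fair_selection_probs(est_rewards, merit=lambda r: 1.0 + r):
    """Merit-based fair selection: each arm is picked with probability
    proportional to a merit function of its estimated mean reward.
    Because the merit function is positive, every arm keeps a nonzero
    selection probability, which is the fairness property in play."""
    merits = [merit(r) for r in est_rewards]
    total = sum(merits)
    return [m / total for m in merits]
```

Under delayed feedback, the estimated rewards fed into this rule lag behind the true means, which is why the regret bounds above depend on quantiles of the delay distribution.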